Abstract

The posttranslational modification of proteins by the ubiquitination pathway is an important regulatory mechanism in eukaryotes. To date, however, studies on the evolutionary history of the proteins involved in this pathway have been restricted to E1 and E2 enzymes, whereas E3 studies have focused mainly on metazoans and plants. To gain a wider perspective, here we perform a genomic survey of the HECT family of E3 ubiquitin-protein ligases, an important part of this posttranslational pathway, in genomes from representatives of all major eukaryotic lineages. We classify eukaryotic HECTs and reconstruct, by phylogenetic analysis, the putative repertoire of these proteins in the last eukaryotic common ancestor (LECA). Furthermore, we analyze the diversity and complexity of HECT protein domain architectures across extant eukaryotic lineages. Our data show that LECA had six different HECTs and that protein expansion and N-terminal domain diversification shaped HECT evolution. Our data also reveal that animals and unicellular holozoans considerably increased the molecular and functional diversity of their HECT systems compared with other eukaryotes. Other eukaryotes, such as the apusozoan Thecamonas trahens and the heterokont Phytophthora infestans, independently expanded their HECT repertoires. In contrast, plant, excavate, rhodophyte, chlorophyte, and fungal genomes have a more limited enzymatic repertoire. Our genomic survey and phylogenetic analysis clarify the origin and evolution of different HECT families among eukaryotes and provide a useful phylogenetic framework for future evolutionary studies of this regulatory pathway.

Key words: ubiquitination pathway, posttranslational regulation, multicellularity, last common ancestor of eukaryotes, Holozoa. 



Introduction 

Proteins are the main structural and functional components of 
all cells. To efficiently respond to different environmental con- 
ditions, the protein levels need to be constantly regulated. The 
ubiquitination pathway is one of the most important post- 
translational mechanisms for regulating protein turnover and 
molecular cell dynamics (Rotin and Kumar 2009). It is based on 
the posttranslational modification of proteins by the ligation of 
ubiquitin, a 76 amino acid signaling peptide that is conserved 
across eukaryotes. This ubiquitin flag targets the proteins to a 
number of different outcomes, such as protein degradation, 
membrane sorting, and signaling functions (Rotin and Kumar 
2009). The ubiquitination pathway involves the sequential transfer of activated ubiquitin (Ub) from E1 (ubiquitin-activating enzyme) to E2 (ubiquitin-conjugating enzyme), and subsequently from E2 to E3 (ubiquitin ligase), which attaches Ub to the target protein. E3 ubiquitin ligases transfer Ub to one or more Lys residues in the substrate by linking the C-terminal Gly of Ub with a Lys of the target protein (and/or a Lys of the Ub itself). Ubiquitination can occur in different forms (Mukhopadhyay and Riezman 2007): mono-ubiquitination (attachment of a single Ub to a single Lys), multi-ubiquitination (several Lys residues tagged with Ub), and polyubiquitination (addition of a Ub chain to a single Lys of the target protein). Typically, mono- and multi-ubiquitination are related to subcellular localization processes such as the secretory and endocytic pathways (Hicke 2001). Polyubiquitination, on the other hand, directs proteins to the 26S proteasome (a multiprotein complex consisting of 19S regulatory and 20S catalytic subcomplexes), which recognizes ubiquitinated proteins and degrades them, a common fate for misfolded or damaged proteins (Pickart and Fushman 2004).

To date, several studies have been carried out to resolve the 
evolutionary history of the ubiquitination pathway from a pan- 
eukaryotic point of view. These studies have, however, fo- 
cused on the most conserved elements of the system, that 
is, the E1 (Burroughs et al. 2009) and E2 enzymes (Burroughs 
et al. 2008; Michelle et al. 2009; Ying et al. 2009), revealing 
that this pathway is ancient and widely distributed in all the eukaryotic lineages considered, as is also the case for the ubiquitin proteins themselves (Burroughs et al. 2007).

Conversely, most studies on E3 ubiquitin ligases have focused mainly on animals (Rotin and Kumar 2009; Mann 2010) and plants (Downes et al. 2003), so little is known about the origin and evolution of these ligases within eukaryotes or about their relative importance in different eukaryotic lineages.

E3 ubiquitin ligases are of particular interest in evolutionary studies of the ubiquitination system because they are far more diversified than E1 and E2 enzymes. The reason is that they are responsible for the specificity of the ubiquitination system, that is, they recognize, discriminate, and interact with the proper protein substrate (Rotin and Kumar 2009), and are therefore more functionally specialized. In fact, E3 enzymes fall into various groups according to their quaternary structure, their specific domain arrangements, and the way in which they interact with E2 and the target protein. These include, for instance, the HECT and RING ligases and the cullin-RING ligase (CRL) complexes. These proteins typically have a wide range of domain architectures involving specific protein-protein interaction motifs.

Indeed, the few eukaryotic genomes so far analyzed often 
encode many more E3 enzymes than E1 or E2. For example, 
there are more than 600 types of E3 in the human genome, 
whereas there are only two E1 proteins and approximately 30 
E2 proteins (Schwartz and Ciechanover 2009). 

HECT proteins are defined by the specific HECT domain, a 
C-terminal domain of approximately 350 amino acids that is 
essential for their Ub-ligase activity. The HECT domain is exclusive to HECT E3 ligases and is widespread among eukaryotes (Punta et al. 2012). HECT proteins participate directly in the ligation process: Ub is first transferred to a highly conserved catalytic cysteine residue, forming a thioester intermediate, and is then passed from this cysteine to the substrate (fig. 1) (Rotin and Kumar 2009).

Previous studies have devised a phylogenetic classification of animal HECTs (Mann 2010); however, little is known about the diversity of HECTs among all eukaryotes. Here, we perform a genomic survey of HECT ligases in eukaryotes and provide a useful evolutionary framework for future analyses. We also analyze the diversity of HECT protein domain architectures across the different eukaryotic lineages, as well as the putative relationship between the expansion of the HECT-dependent ubiquitination system and the origin of multicellularity in several eukaryotic clades.

Fig. 1. Schematic representation of Ub ligation to a protein substrate by a HECT ligase. Ub is transferred from an activating enzyme (E1) to a conjugating enzyme (E2) and then to the HECT ligase (E3), where it forms a thioester bond with a catalytic Cys residue of the HECT domain. The E3 then ligates the Ub to a Lys residue of the substrate (S) through an isopeptide bond.

Materials and Methods 

Taxon Sampling and Sequence Retrieval 

HECT sequences were retrieved from the complete genome sequences of 44 taxa, representing all recognized eukaryotic supergroups. Taxon sampling included 9 animals, 5 unicellular Holozoa, 8 Fungi, 1 Apusozoa, 3 Amoebozoa, 3 plants, 5 unicellular algae, 3 Heterokonta (1 being multicellular), and 11 other unicellular Bikonta (see supplementary table S1, Supplementary Material online). HECT amino acid sequences were retrieved with an HMMER search, using the HMM profile of the Pfam HECT domain entry (PF00632) as the query, default parameters, and an inclusion E-value threshold of 0.05. The search yielded 744 sequences (see supplementary fig. S3, Supplementary Material online).
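As a minimal sketch of the E-value filtering step, assume the HMMER search was run with `--tblout`, so that each hit is written on one whitespace-delimited line whose fifth field is the full-sequence E-value. Hits passing the 0.05 threshold could then be collected as below. The function name `filter_tblout` and the example rows are hypothetical and are not taken from this study.

```python
def filter_tblout(lines, max_evalue=0.05):
    """Return target names whose full-sequence E-value is <= max_evalue.

    Assumes HMMER3 --tblout rows: lines starting with '#' are comments, and
    the first five fields are target name, target accession, query name,
    query accession, and full-sequence E-value.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.append(target)
    return hits


# Hypothetical rows; only the first five columns are used here.
example = [
    "# target name  accession  query name  accession  E-value ...",
    "protA  -  HECT  PF00632.x  1.2e-40  140.1",
    "protB  -  HECT  PF00632.x  0.3      5.2",
]
print(filter_tblout(example))  # ['protA']
```

Parsing the table rather than the human-readable report keeps the filter independent of HMMER's output layout; the threshold is a parameter so other cut-offs can be tried.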

Protein Alignment, Manual Editing, and Data Curation

The retrieved sequences were aligned using the MAFFT (Katoh et al. 2002) L-INS-i algorithm (optimized for local sequence homology [Katoh et al. 2005]). The alignment was then edited manually, and hits fulfilling either of the following conditions were removed: 1) incomplete sequences with more than 99% sequence similarity to a complete sequence from the same taxon, and 2) sequences that showed extremely long branches in the preliminary maximum likelihood (ML) trees. The final alignment was based on the HECT domain alone and was carried out with the MAFFT G-INS-i algorithm (for global homology).
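Curation criterion 1) can be illustrated with a short sketch. It assumes, as our own simplification, that "similarity" is measured as percent identity over aligned columns where neither sequence has a gap, and that each record is labeled with its taxon and with whether it is complete. The helper names `percent_identity` and `redundant_fragments` and the toy alignment are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def redundant_fragments(records, threshold=99.0):
    """Return IDs of incomplete sequences that exceed `threshold` percent
    identity with some complete sequence from the same taxon.

    `records` maps sequence ID -> (taxon, aligned_sequence, is_complete).
    """
    flagged = set()
    for seq_id, (taxon, seq, complete) in records.items():
        if complete:
            continue  # only fragments are candidates for removal
        for other_taxon, other_seq, other_complete in records.values():
            if (other_complete and other_taxon == taxon
                    and percent_identity(seq, other_seq) > threshold):
                flagged.add(seq_id)
                break
    return flagged


# Hypothetical toy alignment (not data from this study).
records = {
    "full1": ("Taxon_A", "MKLV-QRST", True),
    "frag1": ("Taxon_A", "--LV-QRST", False),
    "frag2": ("Taxon_B", "--LV-QRST", False),
}
print(redundant_fragments(records))  # {'frag1'}
```

Here `frag2` is kept because no complete sequence from its own taxon matches it, mirroring the "same taxon" condition in the text.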

Phylogenetic Analyses

The phylogenetic trees of eukaryotic HECTs were inferred from both ML and Bayesian inference (BI) analyses, using the LG evolutionary model with a discrete gamma distribution of among-site rate variation (four categories) and a proportion of invariable sites, which was the best-fitting model for this data set according to ProtTest (Abascal et al. 2005).

ML trees were estimated with RAxML 7.2.6 (Pthreads version [Stamatakis 2006]), and the best tree from 100 replicate searches was selected. Bootstrap support (BS) was calculated from 500 replicates. BI trees were estimated with PhyloBayes 3.3 (Lartillot et al. 2009), using two parallel runs of 500,000 generations, sampling every 100 generations. Bayesian posterior probabilities (BPPs) were used to assess the statistical support of each bipartition.

Domain Architecture Analysis 

The N-terminal domain architecture of all retrieved sequences was inferred by performing a Pfam scan (Punta et al. 2012), using the gathering threshold as the cut-off value. The domain information of each protein was used to 1) assess the reliability of each sequence in the initial data set, 2) help define protein families according to their architectural coherence, and 3) assess the level of functional and architectural diversification of HECT proteins across eukaryotic lineages. Additional information about some previously uncharacterized domain architectures was obtained from the literature and verified using manual protein alignments. The pattern of acquisition of new domains at the N-terminus of HECT proteins across the eukaryotic tree of life was inferred using a strict parsimony approach based on phylogenetic information from the BI and ML trees.
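The text does not name the parsimony algorithm used for domain acquisition. As one standard example of unweighted parsimony, the Fitch algorithm below scores a single presence/absence character (here, one N-terminal domain) on a rooted binary tree. The tree, the taxon labels, and the coding 1 = present, 0 = absent are hypothetical.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one character on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees;
    `states` maps each leaf to its state (e.g. 1 = domain present, 0 = absent).
    Returns the candidate state set at the root and the minimum number of
    state changes needed to explain the leaf states.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: take the union and count one change below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and presence/absence pattern for one N-terminal domain.
tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
domain_present = {"t1": 1, "t2": 1, "t3": 0, "t4": 0, "t5": 1}
print(fitch(tree, domain_present))  # ({0, 1}, 2)
```

Fitch parsimony weights gains and losses equally; an analysis that considers domain gains rarer than losses would need a weighted (Sankoff) or Dollo-type scheme instead.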

Classification Criteria 

The classification of the HECT proteins is based on two hierarchical categories: 1) protein families, which contain all proteins from orthologous genes with high nodal support, and 2) protein classes comprising one or more families, which are wider groups of phylogenetically related families that descend from one of the HECT proteins inferred to have existed in the last eukaryotic common ancestor (LECA). Protein families sometimes share a common domain architecture; therefore, the domain content of each protein was used as an additional, conditional criterion to define some families. The pattern of gain and loss of families was inferred by strict parsimony based on phylogenetic information from the BI and ML trees.
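The text does not state which parsimony variant was used for families. As one illustrative alternative, a Dollo-style count (each family is gained once, at the smallest clade containing all taxa that have it, and may then be lost repeatedly) can be sketched on a species tree written as nested tuples. The helper names, the tree, and the family distribution are hypothetical, and `origin` assumes the family is present in at least one taxon.

```python
def leaves(tree):
    """Leaf names below a node (a leaf is a str, an internal node a tuple)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def origin(tree, present):
    """Smallest clade containing every taxon that has the family (single gain)."""
    if isinstance(tree, tuple):
        for child in tree:
            if present <= leaves(child):
                return origin(child, present)
    return tree


def count_losses(node, present):
    """Minimum losses below a node: one per maximal clade lacking the family."""
    if not leaves(node) & present:
        return 1  # whole clade lacks the family: one loss on its stem
    if isinstance(node, str):
        return 0
    return sum(count_losses(child, present) for child in node)


# Hypothetical species tree and family distribution (not data from this study).
species_tree = ((("t1", "t2"), "t3"), ("t4", "t5"))
has_family = {"t1", "t4"}
gain_node = origin(species_tree, has_family)
print(gain_node == species_tree, count_losses(gain_node, has_family))  # True 3
```

Under this scheme a family present in two distant taxa is placed at their common ancestor, at the cost of several secondary losses, which is the kind of reasoning used later for families such as HECTX.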

Results and Discussion 

The Evolutionary Origin of the HECT E3 Protein Family

Our phylogenetic analyses recovered six pan-eukaryotic clades of HECT proteins, defined as classes I to VI (figs. 2 and 3). Assuming the leading hypothesis that the root of eukaryotes lies between Unikonta and Bikonta (Stechmann and Cavalier-Smith 2002; Derelle and Lang 2012), our data imply that the last eukaryotic common ancestor had at least six HECTs that remain present in diverse eukaryotic lineages. In turn, these six main classes are divided into 35 distinct HECT families that are specific to certain eukaryotic lineages (fig. 3). This scenario remains the same if the alternative "Excavate-first" hypothesis for the root of the eukaryotes is considered (Rodríguez-Ezpeleta et al. 2007).
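The inference above rests on a simple rule: a clade found on both sides of the assumed root is inferred to have been present in LECA. The sketch below checks this rule under both root hypotheses. The supergroup sets are a deliberate simplification (only the groups used in the example are listed), the class I distribution is read off the class I descriptions later in the text, and the function name `in_leca` is hypothetical.

```python
def in_leca(present_groups, root_side):
    """Return True if a clade is found on both sides of the assumed root.

    `root_side` lists the supergroups on one side of the root; every other
    supergroup is treated as lying on the other side.
    """
    side_one = present_groups & root_side
    side_two = present_groups - root_side
    return bool(side_one) and bool(side_two)


# Simplified root hypotheses (only the supergroups used below are listed).
UNIKONT_SIDE = {"Opisthokonta", "Amoebozoa"}  # Unikont-Bikont split
EXCAVATE_FIRST_SIDE = {"Excavata"}            # "Excavate-first" root

# Supergroups with class I members according to the class I descriptions
# (e.g. KIAA0614 in animals and SAR lineages, HECTEx1, HECTAm1).
class_i = {"Opisthokonta", "Amoebozoa", "Excavata", "SAR"}
herc1 = {"Opisthokonta"}  # animal-specific family

for name, side in [("Unikont-Bikont", UNIKONT_SIDE),
                   ("Excavate-first", EXCAVATE_FIRST_SIDE)]:
    print(name, in_leca(class_i, side), in_leca(herc1, side))
```

A class with members in Excavata as well as in unikonts is placed in LECA under either root, which is why the six-class scenario does not depend on the rooting; a lineage-restricted family such as HERC1 is not.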

The diversification of each class involves many gene dupli- 
cation events and secondary losses (fig. 4), as well as the ac- 
quisition of new accessory domains. Our data show that the 
protein domain architecture is quite diverse as a result of 
domain rearrangements and the acquisition of new domains 
at the N-terminal region (fig. 3). 

Remarkably, domain fusions at the C-terminus have not been detected in any of the analyzed organisms. This might be explained by the fact that the catalytic activity of the HECT domain strongly depends on its tertiary structure: all HECT domains are organized in two structurally distinct lobes (the N-lobe and the C-lobe, the latter containing the catalytic cysteine) that can adopt a limited range of three-dimensional conformations (Huang et al. 1999; Verdecia et al. 2003; Rotin and Kumar 2009). This tertiary structure is functionally relevant (and therefore constrained) because it defines the position of the catalytic cysteine residue with respect to the E2 enzyme and the ubiquitination substrate during the ligation process (Verdecia et al. 2003). It also determines the way in which ubiquitin chain elongation occurs (Maspero et al. 2011).

Assuming the "Unikont-Bikont split" hypothesis on the root of eukaryotes (Stechmann and Cavalier-Smith 2002; Derelle and Lang 2012), the analysis of protein domain architectures reveals class-specific N-terminal domain arrangements that are pan-eukaryotically distributed in classes I (SPRY), V (IQ), and VI (DUF908, DUF913, UBA, and DUF4414), whereas the founding proteins of classes II, III, and IV […]. Under the alternative "Excavate-first" hypothesis (Rodríguez-Ezpeleta et al. 2007), a similar scenario emerges, except for the ancestral IQ (class V), DUF908, DUF913, and UBA domains (class VI), which are not recovered. However, DUF4414 (class VI) still appears to have been present in LECA.

The syntax of N-terminal domain architectures in HECTs is mainly based on protein recognition motifs (IQ, WW, Ankyrin repeats, zinc fingers, etc.) that enable HECTs to specifically ubiquitinate certain substrates. Domains involved in targeting the HECT enzyme to certain molecules are also common, such as C2 (lipid binding), Laminin-G3 (complex sugar binding), and PABP (mRNA polyadenylate binding). Some of these motifs are especially "promiscuous" and have been independently gained several times throughout HECT evolution (for instance, ubiquitin-binding UBA and protein-binding domains such as WWE, SPRY, RCC1-like domain [RLD], Ankyrin, and MIB-HERC2) (fig. 5; details discussed later). Despite the generally conserved syntax of HECT N-terminal architectures, rare domains with no clear function exist in some uncharacterized HECTs. The number of such unusual HECTs is expected to increase as more genomes are included in future surveys of this kind.

Fig. 2. BI phylogenetic tree of HECT proteins inferred from an alignment of the HECT domain (220 amino acid positions). Colored clades indicate classes; collapsed clades indicate families (in regular text) and other clades of interest (italics). Nodal labels indicate BPP and 500-replicate ML BS values, respectively. Dashes indicate that the node is not recovered. Six pan-eukaryotic classes can be distinguished, with 35 families within these. For each class, the putative ancestral N-terminal architecture is shown. Complete BI and ML trees are shown in supplementary figures S1 and S2, Supplementary Material online.

Classification of Eukaryotic HECT E3 Ligases 

We have classified eukaryotic HECTs into classes and families according to the topology obtained in the phylogenetic analyses. The main characteristics of each class and family are described below.

Class I: Large HERCs and Related Families 

Class I contains seven protein families: HERC1, HERC2 (both known as large HERCs), KIAA0614, HECTD3, HECTEx1, HECTAm1, and HECTHe1 (figs. 2 and 3). The monophyly of class I is supported by a BPP of 1.0 and a BS value of 89% (fig. 2). Large HERCs were previously thought to be related to the family of small HERCs (class III in our tree) because they share the RLD (Hadjebi et al. 2008), but our data corroborate that large and small HERCs do not form a monophyletic group and that their RLDs have been independently acquired (Gong et al. 2003; Mann 2010).

HERC1 is an animal-specific family that has been lost in 
Arthropoda (Daphnia pulex and Drosophila melanogaster) 
and Hemichordata (Saccoglossus kowalevskii). HERC1 pro- 
teins have a specific domain architecture consisting of HECT, 
two RLDs, SPRY, and a variable number of WD40 repeats. In 
some cases, there is also a UBA domain. In humans, HERC1 
binds to clathrin heavy chain and has GEF activity on ARF1, a 
GTPase involved in membrane trafficking in the Golgi appa- 
ratus (Rosa and Casaroli-Marano 1996). HERC1 also ubiquiti- 
nates the tumor suppressor TSC2 (involved in the tuberous 
sclerosis complex disease and perhaps in membrane traffick- 
ing [Chong-Kopera et al. 2006]). 

The HERC2 family, which appears as the sister group of HERC1, includes proteins from both Metazoa and Choanoflagellata. In mammals, HERC2s ubiquitinate BRCA1 (breast cancer suppressor) and target it for degradation (Wu et al. 2010). They have a complex domain architecture with two RLDs and several protein recognition motifs: Cyt-b5 (Ozols 1989), MIB-HERC2 (also present in the RING E3 Mib2 [Itoh et al. 2003]), Cul7 (present in the RING E3 Cul7 [Kaustov et al. 2007]), ZZ, and APC10. This architectural diversification occurred at the origin of the Metazoa, since the choanoflagellate homologs from both Monosiga brevicollis and Salpingoeca rosetta have simpler architectures (RLD repeats and RLD, APC10, and SPRY domains, respectively).

The KIAA0614 family is a pan-eukaryotic family with homologs in Metazoa, Choanoflagellata, Heterokonta, Alveolata, Rhizaria, and Haptophyta. Some proteins have a SPRY domain, while proteins from Phytophthora infestans and Tetrahymena thermophila have an additional zf-RanBP domain.

The HECTD3 family contains animal proteins (bearing an APC10 domain) and a homolog from Acanthamoeba castellanii. Human HECTD3 ubiquitinates some proteins involved in neural development and brain function, such as Syntaxin-8 (Zhang et al. 2009) and Tara, which is also a regulator of cell growth, cytoskeletal actin reorganization, and cell motility (Yu et al. 2008).

HERC2 and HECTD3 are the only HECT families with APC10 domains, and APC10-bearing HECTs are exclusive to animals and choanoflagellates. The APC10 domain is also found in the RING E3 APC/C complex, which takes part in cell cycle control by regulating mitosis (Jin et al. 2008). In this context, APC10 is responsible for the regulation of substrate binding (Peters 2002).

The other families within this class (i.e., HECTEx1, HECTAm1, and HECTHe1) are named after their taxonomic content (Excavata, Amoebozoa, and Heterokonta) and are defined by their distinctive domain arrangements. For instance, HECTAm1 contains PH and SPRY motifs, and HECTHe1 and HECTEx1 have Laminin-G3 (capable of reversibly binding specific complex sugars, an exclusive feature of these two families) and SPRY domains. In addition, class I contains a clade of Thecamonas trahens proteins bearing various protein recognition domains that seem to have been independently acquired (fig. 2).

The SPRY domain is exclusive to class I HECTs and is present in most of its families, which suggests that it could have existed in the ancestral LECA protein that gave rise to this class. SPRY has been reported to play a role in the recognition of ubiquitination substrates (Nishiya et al. 2011).

Class II 

The well-supported class II (BPP = 1.0; BS = 89%) is composed
of four protein families: HECTD1, HECTHe2, UPL3/4, and 
Trip12 (figs. 2 and 3). 

The HECTD1 family contains sequences from Metazoa and 
Choanoflagellata. They have a distinctive protein domain 
arrangement containing Sad1-UNC, MIB-HERC2 domains 
and, in some cases, Ankyrin repeats. Human HECTD1 poly- 
ubiquitinates Hsp90, a chaperone that controls cell motility, 
which is essential in brain development (Sarkar and Zohn 
2011). The HECTHe2 family also contains proteins with 
Ankyrin repeats and is specific to Heterokonta, Cryptophyta, 
and Haptophyta. Their functions are still unknown. 

Trip12 (also known as ULF) includes proteins from animals, 
unicellular Holozoa and Fungi. Animal Trip12s are defined by 
two protein recognition domains: HEAT repeats, which are Armadillo-like motifs that recognize ubiquitin degradation signals in E3 substrates (Tewari et al. 2010); and WWE, which
recognizes the Ankyrin motif of Notch and ligand-binding 
domains of other proteins (Aravind 2001). Fungal Trip12s 
also have HEAT/Armadillo repeats with a similar function, 
for example, the yeast Ufd4 HECT (Tewari et al. 2010). 

Trip12 activity hampers tumor suppression in humans by 
preventing the p53 response to oncogenic events: it pro- 
motes the degradation of ARF, an inhibitor of the RING E3 
Mdm2 (which in turn targets p53 for degradation [Brooks 
and Gu 2006]). Trip12 also targets p16 (a murine negative cell cycle regulator during embryogenesis) for degradation (Kajiro et al. 2011).

The UPL3/4 family includes homologs from several Bikonta clades (Viridiplantae, Excavata, Cryptophyta, Haptophyta, and Rhodophyta). Some Viridiplantae proteins also have Armadillo repeats, which have been predicted to recognize nuclear localization signals (Downes et al. 2003). Arabidopsis UPL3 polyubiquitinates an unknown regulator of trichome development (Downes et al. 2003), and both UPL3 and UPL4 participate in the regulation of gibberellin signaling (Coates 2008). However, their specific substrates remain elusive.

Class III: Small HERCs, E6AP, and Other Families 

Class III (BPP = 1.0; BS = 88%) includes small HERCs, HECTD2, E6AP (all of them named after the human proteins within them), and HECTX (Mann 2010), all composed of Unikonta proteins. However, class III also includes proteins from Bikonta species (Viridiplantae, SAR, Cryptophyta, Haptophyta, and Excavata) that cannot confidently be assigned to any family, branching in an unclear position related to HECTD2, E6AP, and HECTX, with low nodal support.

The family of small HERCs includes proteins from animals, Choanoflagellata, and Filasterea. It comprises the human proteins HERC3, 4, 5, and 6, that is, the remaining HERC proteins that were formerly considered to be closely related to large HERCs 1 and 2 (see class I). Therefore, any a priori functional or evolutionary similarities between these families need to be reassessed. For instance, in contrast to large HERCs, the RLD motifs of small HERCs do not act as guanine nucleotide exchange factors (Rotin and Kumar 2009).

Indeed, convergent acquisition of RLD domains seems to be a common event in HECT evolutionary history: they are also present in several non-holozoan "HERC-like" proteins that cannot be assigned to any specific family (A. castellanii, Toxoplasma gondii, Ectocarpus siliculosus, Cyanidioschyzon merolae, and Emiliania huxleyi from class III; and P. infestans from class I). RLD domains intervene in a wide variety of cellular processes (RNA processing and transport, RNA mating, initiation of mitosis, chromatin condensation, guanine nucleotide exchange, protein recognition in DNA binding, and ubiquitination), which could explain their high "promiscuity."

Human small HERCs have important functions. For exam- 
ple, HERC3 binds Ub, PLIC1, or PLIC2 (Ub-like proteins) to 
endocytic proteins, thus regulating vesicular transport (Cruz 
et al. 2001). HERC4 is essential for spermatogenesis in mice 
(Cruz et al. 2001), and HERC5 is involved in the immune 
response related to interferon signaling pathways and polyubiquitinates IκB (inhibitor of the pro-inflammatory transcription factor NF-κB) (Kroismayr et al. 2004; Dastur et al. 2006).

The E6AP family (also known as E3A or UBE3A) includes 
the human protein E6AP (one of the first described HECTs), as 
well as proteins from animals, Capsaspora owczarzaki, 
Sphaeroforma arctica, and Mortierella verticillata, although 
the latter has poor nodal support. Human E6AP is known 
for its role in the inactivation of tumor suppressor p53 through 
proteasomal degradation (Scheffner 1998). E6AP is a good example of the complex interplay between E3s, in which different E3s play antagonistic roles. For instance, human E6AP is polyubiquitinated by UBR5/EDD (another HECT E3, discussed later) (Tomaic et al. 2011), whereas its activity is enhanced (in a ubiquitin-independent manner) by HERC2 (Kuhnle et al. 2011).

The HECTD2 family is an Opisthokonta-specific family that 
includes sequences from animals and Fungi, but not from 
unicellular Holozoa. HECTD2 proteins have a single HECT 
domain. Murine and human HECTD2 are known to intervene in protein degradation in neurodegeneration processes (Lloyd et al. 2009).

HECTX contains proteins from Cnidaria and Placozoa, as well as from Filasterea, Fungi, and Amoebozoa. Thus, the absence of HECTX from bilaterian genomes is probably due to secondary loss.

Class IV 

Class IV includes four families: UBR5/EDD, G2E3, GL-Metazoa, 
and GL-Bikonta. The latter three are extremely divergent at 
the sequence level (figs. 2 and 3). The nodal support for this 
class is weak (fig. 2), but both Bayesian and ML analyses re- 
covered the clade. In contrast, the nodal support for all of the 
families, except GL-Bikonta, is very good (BPP=1.0 and 
BS = 99-100%). 

The UBR5/EDD family includes proteins from animals (which have an EDD domain for binding ubiquitin, a zf-UBR protein recognition motif, and a PABP domain) and architecturally simpler homologs from the choanoflagellate Sal. rosetta and the filasterean Cap. owczarzaki. Human EDD and Dro. melanogaster HYD act as general tumor suppressors by ubiquitinating E6AP (Tomaic et al. 2011), which increases p53 levels and induces cell senescence (Smits 2012). EDD and HYD also ubiquitinate TopBP1 (topoisomerase II-binding protein 1, which intervenes in the DNA damage response [Honda et al. 2002]) and negatively regulate the expression of Hh (hedgehog pathway) and Dpp (decapentaplegic pathway), two crucial elements in Drosophila eye disc development (Lee 2002).

The G2E3, GL-Metazoa, and GL-Bikonta families are composed of proteins with a highly divergent HECT domain and different domain arrangements that could confer their own functional specificities. For instance, some proteins from Naegleria gruberi and E. siliculosus (GL-Bikonta) have unusual protein kinase domains of unknown function; and human and murine G2E3s have a non-functional HECT domain and three unconventional RING/PHD-like zinc fingers, two of which have been shown to have ubiquitin ligase activity (Brooks et al.
2008). None of these zinc fingers has been clearly classified 
as either PHD or RING motifs, although Pfam identifies the 
noncatalytically active one as a PHD-like zf-HC5HC2H domain 
(which is consistent with the fact that PHD domains are unable 
to act as ubiquitin ligases [Scheel and Hofmann 2003]). The 
lack of functional constraints on the HECT sequence would 
explain its divergence from other HECT proteins. 

The most parsimonious explanation for the evolution of class IV is that an ancestral LECA gene underwent a duplication that gave rise to 1) the holozoan EDD family (secondarily lost in Bikonta species), and 2) a fast-evolving group including the G2E3, GL-Metazoa, and GL-Bikonta families.

Class V 

Class V (BPP = 1.0; BS = 96%) contains five families with proteins from Unikonta and Bikonta: UBE3B, UBE3C, HECTFu2,
UPL6, and UPL7 (figs. 2 and 3). Except for HECTFu2, proteins 
belonging to this class have an exclusive IQ domain that could 
have been present in the ancestral protein that gave rise to 
class V. IQ typically binds to calmodulin and is also present in 
proteins that interact with GTP regulatory and cell cycle pro- 
teins, receptors, and channel proteins (Rhoads and Friedberg 
1997). 

UBE3B is an Opisthokonta-wide family in which an IQ 
domain is present in some proteins from animals, Filasterea 
(Cap. owczarzaki) and Fungi (M. verticillata). Proteins from the 
animal family UBE3C also have an IQ domain. UBE3B is 
thought to play a role in the oxidative stress response in 
humans and Caenorhabditis elegans (Oeda et al. 2001), and 
UBE3C plays an undetermined role in inflammatory responses in the human airways, probably related to IκB ubiquitination (Pasaje et al. 2011).

The HECTFu2 family, defined here for the first time, is specific to Fungi, and its proteins do not bear any particular N-terminal domain architecture. No substrates are known for this family.

The UPL6 and UPL7 families form two independent clades, both consisting of Embryophyta and Chlorophyta proteins. UPL7 also contains proteins from Alveolata and
Heterokonta. Again, IQ domains are found in Embryophyta 
and Chlorophyta sequences from UPL7 and Embryophyta se- 
quences from UPL6. Contrary to previous studies (Gong et al. 
2003), we did not recover a sister-group relationship between 
UPL6 and UPL7. 

Class VI: Nedd4-Like, HUWE1, HACE1, and Other Families 

Class VI is a wide group that includes 13 families plus three 
unclassified clades (figs. 2 and 3). The Bayesian analysis pro- 
vides a good nodal support for this class (BPP = 0.99), but the 
clade is not statistically supported by ML. 

The Nedd4-like group contains all families with C2 and WW domains: HECW/NEDL (with 1-2 WWs; specific to animals), and Nedd4, WWP-Itchy, and Smurf (with 2-4 WWs; specific to Holozoa). This group also contains two unclassified clades consisting of apusozoan and fungal proteins (with the same protein domain architecture) and a clade with proteins from unicellular Holozoa (with its own domain arrangement consisting of C2 and a CCCH zinc finger). The C2 domain targets
the enzyme to membranes by binding to lipids (Ponting and 
Parker 1996), whereas WW is a recognition domain that se- 
lectively picks target proteins, typically through PY motifs 
(Chen and Sudol 1995; Macias et al. 2002). 
A possible explanation for the evolution of this group of
families involves the assumption that one ancestral homolog 
was present in the genome of the last Apusozoa- 
Opisthokonta common ancestor, which underwent indepen- 
dent diversifications in Apusozoa and Opisthokonta. 

The Nedd4 family includes proteins from all holozoan line- 
ages. In animals, Nedd4s are key downregulators of several 
receptors involved in cell signaling and membrane trafficking. 
For example, Nedd4s are responsible for the ubiquitination 
and stability of the insulin-like growth factor I receptor 
(Vecchione et al. 2003); Dro. melanogaster Nedd4 targets 
Notch receptor for proteasomal degradation (Sakata et al. 
2004); and human Nedd4-1 ubiquitinates EGF (epidermal 
growth factor) receptor and ACK (a tyrosine kinase signaling 
factor) in response to EGF overexpression itself (Lin et al. 
2010). 

The WWP-Itchy family is also specific to Holozoa. It includes WWP1, WWP2, and Itchy, three human proteins that have been studied in depth, as well as Su(dx) from Dro. melanogaster. WWP-Itchy proteins regulate endosomal sorting and signaling by polyubiquitinating Notch in humans, mice, and Cae. elegans (Qiu et al. 2000; Wilkin et al. 2004; Shaye and Greenwald 2005). They also regulate the Hippo pathway: WWP1, WWP2, and Itchy polyubiquitinate AMOT (a regulator of YAP/Yorkie, the central member of the Hippo pathway, which is essential for the constitution of a fully functional pathway [Sebé-Pedrós et al. 2012; Wang et al. 2012]). Itchy also polyubiquitinates Warts/Lats, another member of the Hippo pathway found in Opisthokonta (Ho et al. 2011). Moreover, human Itchy polyubiquitinates the transcription factors p63 and p73 (Rossi et al. 2005, 2006).

Within the Smurf family (present in all holozoan lineages except Ichthyosporea), DSmurf (the Dro. melanogaster homolog) is known to regulate imaginal disc development (Liang et al. 2003) and embryonic dorsal-ventral patterning (Podos et al. 2001) by polyubiquitinating MAD (Dpp pathway), and human Smurfs (Smurf1 and Smurf2) are known to antagonize TGFβ signaling and therefore regulate cell growth and proliferation (Massague and Gomis 2006).

The HECW family (or NEDL/Nedd4-like) contains animal HECTs, including the human proteins NEDL1 (which stabilizes p53 in a ubiquitin-independent manner, thereby enhancing p53-mediated apoptosis [Li et al. 2008]) and NEDL2 (which stabilizes p73 [Miyazaki et al. 2003]).

The fungal, apusozoan, and unicellular-holozoan Nedd4- 
like clades are incertae sedis. As for the Nedd4-like fungal 
proteins (Fungi clade in fig. 2), only Saccharomyces cerevisiae 
Rsp5p has been characterized: It controls gene expression during nutrient limitation-driven stress (Cardona et al. 2009) and has various roles in intracellular trafficking (Belgareh-Touze et al. 2008) and in plasma membrane and cell wall organization (Kaminska et al. 2005). None of the Nedd4-like proteins from the apusozoan and unicellular-holozoan clades has been characterized.



Class VI also includes several families characterized by a 
common domain architecture consisting of DUF908, 
DUF913, and DUF4414 (domains of unknown function). 
These three domains typically co-occur in HECT proteins and are evolutionarily conserved in various Unikonta and Bikonta lineages, revealing an ancient origin for this group of proteins. These include the HUWE1, HECTFu1 (HUWE1-like), UPL1/2, HECTAl1, and HECTHe3 families.

The HUWE1 family is named after the human protein 
within it (also known as UREB1, HectH9, KIAA0312, LASU1, 
ARF-BP1, or Mule). HUWE1 proteins have a complex domain 
architecture consisting of DUF908, DUF913, WWE, UBA, and 
DUF4414. The family includes representatives from animals, M. brevicollis, and Amoebozoa. The M. brevicollis protein consists of a single HECT domain, whereas the amoebozoan proteins have the complete arrangement except WWE. Human HUWE1 polyubiquitinates Myc (an oncoprotein and transcription factor), a modification that is essential for the transactivation of several Myc target genes, the
recruitment of co-activator p300 and the induction of cell 
proliferation (Adhikary et al. 2005). It also enhances p53 sta- 
bility by helping ARF inhibit p53 ubiquitination by Mdm2 
(Brooks and Gu 2006), among other functions (Chen et al. 
2005; Zhong et al. 2005; Hall et al. 2007). 

The HECTFu1 family includes fungal proteins with a HUWE1-like N-terminal architecture (without WWE), as well as proteins with other specific domains or simpler arrangements. There is indirect evidence that Tom1 (a yeast HUWE1-like protein) is involved in the posttranslational regulation of Cdc6 (Hall et al. 2007).

UPL1/2 is a Viridiplantae-specific family that contains 
Embryophyta proteins with the characteristic DUF908- 
DUF913-UBA-DUF4414 N-terminal architecture and green 
algae proteins with a single HECT domain. 

Both the HECTHe3 (present in Heterokonta) and HECTAl1 (present in Alveolata) families also contain the DUF4414-HECT arrangement.

Finally, there are four additional families with good nodal 
support and domain coherence within class VI: KIAA0317, 
HACE1, HECTHe4, and UPL5. 

The HACE1 family contains proteins from all holozoan 
clades plus A. castellanii. HACE1 proteins have a variable 
number of Ankyrin repeats (typically two to three) and some- 
times a PHD domain. The ubiquitinating activity of HACE1 
is known to regulate Golgi complex disassembly and reassem- 
bly during mitosis (Tang et al. 2011), and also plays a role 
in various cancer processes (Zhang et al. 2007). The HACE1 
and HUWE1 families were thought to be sister groups 
and, together, to be a sister group to the Nedd4-like 
group of proteins (Marin 2010); however, we did not recover this topology, but rather a polytomy of several families (fig. 2).

The KIAA0317 family is exclusive to the Metazoa and Choanoflagellata (Sal. rosetta) clades. Most of its members have Filamin repeats, which are found only in this family. They have no known substrates, but Filamin is known to mediate protein recognition in other proteins and contexts (Ohta et al. 2006).

The HECTHe4 family is specific to Heterokonta and includes P. infestans proteins with a distinctive zf-RanBP domain and other proteins with only a HECT domain. Both ML and BI analyses have linked this family to the Nedd4-like group of proteins, but with low statistical support (fig. 2).

UPL5 is a Bikonta family that includes proteins from 
Viridiplantae (with a Ub domain), as well as from Rhizaria 
(Bigelowiella natans) and Cryptophyta (Guillardia theta)
clades (with just a HECT domain). Arabidopsis thaliana UPL5 
polyubiquitinates the WRKY53 transcription factor, which 
promotes leaf senescence (Miao and Zentgraf 2010). Ub-like 
domains within E3 enzymes probably allow for the interaction 
of these enzymes with other members of the pathway (Miao 
and Zentgraf 2010). 

The Origins of Multicellularity and the Evolution of the 
HECT E3 System 

As unicellular eukaryotes evolved into multicellular life forms, the need for more complex and finely tuned regulatory mechanisms increased, to meet new requirements related to cell proliferation, adhesion, differentiation, programmed cell death, and extra- and intracellular signaling. Given that the ubiquitination pathway is an important regulatory layer responsible for key posttranslational modifications and for protein turnover, one might expect expansions of the ubiquitination toolkit (including the HECT system) at the origin of multicellular clades. To test whether this is the case, we analyzed the functional and molecular diversity of the HECT system in several eukaryotic lineages.

Specifically, we used the relationship between the number 
of HECT proteins and the number of distinct N-terminal 
domain architectures of those proteins as an estimator of 
the diversity of the HECT system in each genome.
Our data show that the number of HECT proteins positively 
correlates with the number of distinct N-terminal domain ar- 
chitectures (fig. 6). 
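The diversity estimator above can be illustrated with a short sketch. It assumes that each HECT protein is represented as a tuple of domain names ordered from N- to C-terminus and ending in "HECT", that the N-terminal architecture is everything before the HECT domain, and that the correlation is measured with Spearman's rank coefficient. The text does not state which statistic was used, and the genomes and architectures below are hypothetical toy values, not data from this study.

```python
from statistics import mean

def hect_diversity(proteins):
    """Return (number of HECT proteins, number of distinct N-terminal architectures).

    Each protein is a tuple of domain names ordered N- to C-terminus, ending in "HECT";
    its N-terminal architecture is everything before the HECT domain.
    """
    n_terminal = {p[:p.index("HECT")] for p in proteins}
    return len(proteins), len(n_terminal)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical toy genomes (NOT data from this study); one tuple per HECT protein.
genomes = {
    "genome_A": [("HECT",), ("HECT",), ("C2", "WW", "HECT")],
    "genome_B": [("HECT",), ("C2", "WW", "WW", "HECT"), ("Ankyrin", "HECT"),
                 ("RLD", "HECT"), ("UBA", "HECT")],
    "genome_C": [("HECT",), ("HECT",)],
    "genome_D": [("HECT",), ("C2", "WW", "HECT"), ("RLD", "HECT"), ("HECT",)],
}

counts = [hect_diversity(p) for p in genomes.values()]
n_proteins = [c[0] for c in counts]
n_architectures = [c[1] for c in counts]
rho = spearman(n_proteins, n_architectures)
print(n_proteins, n_architectures, rho)
```

With real genomes one would more likely call an established routine such as `scipy.stats.spearmanr`, which also reports a p-value; the hand-written version only makes the rank-based computation explicit.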

By this measure, the HECT system is enriched in animals and unicellular Holozoa, the Heterokonta P. infestans and E. siliculosus, and the Apusozoa T. trahens. Conversely, Fungi, plant, Chlorophyta, Rhodophyta, and Excavata genomes are HECT-poor, with fewer proteins and little protein domain diversification. Notably, some species, such as the Rhizaria B. natans and the Haptophyta Emi. huxleyi, have a high number of HECT proteins but a low degree of domain diversification.

The Apusozoa T. trahens, the sister group to Opisthokonta (Torruella et al. 2012), also shows a relatively rich HECT toolkit, much richer than those of plants and Fungi and similar in complexity to those of metazoans. Our data show that some HECT proteins diversified independently within T. trahens. For instance, class I contains an unclassified T. trahens clade whose proteins have independently acquired different protein recognition domains (such as SPRY, ZZ, and zf-UBR). Also, the well-known Nedd4-like group of HECTs dates back to the last common ancestor of Opisthokonta and Apusozoa. New apusozoan genomes will make it possible to gain further insights into the evolution of the HECT system in this lineage.

The diversity of HECTs in Heterokonta is highly variable. 
Thalassiosira pseudonana has a poor HECT system, whereas 
E. siliculosus (a multicellular brown alga) and especially P. infestans have a more diversified HECT system, comparable with that of animals, which, according to the present phylogeny, most likely evolved from a small basal toolkit similar to that of Tha. pseudonana. Moreover, proteins from both P. infestans and E. siliculosus have convergently acquired several architectures characteristic of Opisthokonta HECTs. For example,
P. infestans proteins have recognition domains such as MIB- 
HERC2, UBA, SPRY, or RLD (typical of large HERC families), 
and E. siliculosus proteins have RLD and Kelch repeats. 

Our analyses show that animals have the most expanded 
and diverse HECT system among eukaryotes, and their unicel- 
lular holozoan relatives (Choanoflagellata, Filasterea, and 
Ichthyosporea) have an intermediate diversity of the system 
(fig. 6). This suggests that there was a burst of HECT diversity 
at the onset of Metazoa, but that a relatively complex HECT 
system already existed in the animals' closest unicellular rela- 
tives. Indeed, the origin of most (17 out of 22) HECT families 
containing animal proteins (among those defined in this study) 
pre-dates the origin of animals (fig. 4). The higher degree of HECT diversification in animals is instead explained by the acquisition of new domains in the N-terminal regions of HECTs. Leaving aside the hemichordate S. kowalevskii (a clear
outlier to the general trend), animals have between 9 and 14 
different HECT architectures, whereas their closest unicellular 
holozoan relatives have between four and nine arrangements. 

The number of families present in each clade provides ad- 
ditional information on the degree of diversification of the 
HECT system in each taxon (fig. 4). For instance, 24 new fam- 
ilies appear at some point during the evolution of the 
Opisthokonta lineage. The Holozoa are the most family-rich 
lineage, with 22 families, 5 of which are specific to Metazoa. 
Also, there are five families present in plants (all of which 
appear either at the origin of Bikonta or Viridiplantae). This 
reveals that in both animals and plants most HECT families 
pre-date the respective origins of multicellularity. 

We also mapped the acquisition of N-terminal domains 
across the tree of eukaryotes (fig. 5). This is a common 
event within each class, and those architectures that appear 
at the base of multicellular clades and their closest unicellular 
relatives are of particular interest. Our data show that the 
acquisition of new domains is a common event in the 
holozoan clade, especially at the root of animals and Choano- 
flagellata (six domains) and at the node leading to Metazoa 
(eight domains). Indeed, there are five families (namely, EDD, 
HECTD3, HUWE1, UBE3B, and HERC2) in which animal 
proteins have more complex architectures than those found in their unicellular relatives' homologs. Conversely, the acquisition of specific protein domains in other multicellular lineages such as Fungi and Embryophyta is minimal.

Overall, our data suggest that increases in both N-terminal 
architectural diversification and absolute number of proteins 
have shaped the evolutionary history of HECT ligases in eukaryotes. An increase in protein number provides duplicate genes that allow sub- or neofunctionalization of HECT proteins. N-terminal domain shuffling is a plastic and adaptable evolutionary mechanism that does not require a change
of gene content. It can account for significant evolutionary 
changes in posttranslational regulation through the adjust- 
ment of substrate specificity and protein localization. Indeed, 
domain shuffling has been acknowledged as an important 
mechanism for explaining the evolution of multidomain pro- 
teins and the appearance of novel proteins, especially regard- 
ing the origin of new proteins in major transitions such as the 
acquisition of multicellularity in animals (Tordai et al. 2005; 
King et al. 2008; Suga et al. 2012). 

It must be noted that HECTs are not the only set of E3 
ligases of the ubiquitin system and they are not equally rele- 
vant in different eukaryotic lineages. This means that HECT- 
poor taxa such as plants or Fungi may not necessarily have a 
poor ubiquitination system. Indeed, Ara. thaliana, with just seven HECTs, has expanded its repertoire of F-box, RING, and U-box E3 ligases compared with other eukaryotes (Lespinet et al. 2002). Conversely, E1 and E2 functions are each performed by a single type of enzyme. All E1 enzymes descend from a common ancestor that was co-opted into ubiquitin-activating functions at the origin of eukaryotes and has since undergone duplications in Unikonta,
Vertebrata, Heterokonta, and Kinetoplastida (Excavata) 
(Burroughs et al. 2009). Similarly, there is just one type of 
E2 enzyme for conjugating ubiquitin, and all (or most) of its known families were already present in the LECA
(Burroughs et al. 2008; Michelle et al. 2009). Altogether, 
this shows that E1 and E2 enzymes radiated concomitantly 
prior to the LECA, when they were recruited for the ubiquiti- 
nation pathway (Burroughs et al. 2008). 

This pattern of evolution is markedly different from that 
shown by HECTs (in this study) and other E3 enzymes (Lespinet et al. 2002), which have undergone different lineage-specific expansions (in the case of HECTs, those detected in Holozoa, Heterokonta, and possibly Apusozoa). This
emphasizes the role of E3s as a specific and functionally spe- 
cialized step of the ubiquitination pathway. 

Conclusions 

Our genomic survey and phylogenetic analysis classify eukaryotic HECTs into six main classes, whose constituent proteins probably descend from six ancestral proteins present in the LECA, assuming the "Unikont-Bikont" hypothesis for the rooting of the eukaryote phylogeny. These six classes include 35 identified protein families, as well as other proteins that cannot be classified with certainty.

We also show that, since the eukaryotic ancestor, the HECT system has increased its functional complexity and capacity to finely tune posttranslational protein regulation in several clades, especially, but not exclusively, in multicellular organisms. The system has also been simplified in other clades, such as unicellular red algae.

The current diversity of the HECT system has been acquired 
through two parallel mechanisms: 1) the acquisition of new 
HECT families through protein duplication, and 2) the acqui- 
sition, by domain shuffling, of new protein domains that spe- 
cifically recognize E3 substrates. We identified a positive 
correlation between the degree of domain diversification 
and the number of HECT proteins present in each genome. 

Our analysis reveals that the domain syntax of HECT proteins is highly conserved across all eukaryotes: domain fusions always occur at the N-terminus of the proteins. This is probably due largely to the physical constraints on catalytic activity imposed by the tertiary structure of HECT proteins.
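The conserved domain syntax can be phrased as a simple check over predicted architectures: a protein conforms if it has exactly one HECT domain and that domain is the C-terminal-most one, so every fused domain lies N-terminal to it. This is a minimal sketch assuming architectures are given as domain names in N- to C-terminal order; the protein IDs are placeholders, and `prot3` is a hypothetical C-terminal fusion included only to show what a violation would look like.

```python
def n_terminal_fusions_only(architecture):
    """True if the protein has exactly one HECT domain and it is the C-terminal-most
    domain, i.e. every fused domain lies N-terminal to HECT.

    `architecture` lists domain names in N- to C-terminal order.
    """
    return architecture.count("HECT") == 1 and architecture[-1] == "HECT"

def syntax_violations(proteins):
    """Return the IDs of proteins whose architecture breaks the N-terminal-fusion rule."""
    return [pid for pid, arch in proteins.items() if not n_terminal_fusions_only(arch)]

# Hypothetical architectures; protein IDs are placeholders.
proteins = {
    "prot1": ("C2", "WW", "WW", "HECT"),
    "prot2": ("DUF908", "DUF913", "UBA", "DUF4414", "HECT"),
    "prot3": ("HECT", "SPRY"),  # hypothetical C-terminal fusion; violates the rule
}
print(syntax_violations(proteins))  # ['prot3']
```

Run over a full set of domain predictions, an empty result would be consistent with the conserved syntax described above.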

The HECT toolkit evolved in a largely independent manner 
in different eukaryote clades, often converging in similar 
domain architectures. Some taxa such as Holozoa are HECT- 
rich, with many HECT types and various domain arrange- 
ments, whereas other taxa such as fungi, plants, and green 
and red algae have HECT-poor genomes. Regarding the evo- 
lution of Holozoa, this study reveals that the onset of new 
families and new protein recognition motifs typically pre- 
date the emergence of animal multicellularity. However, ani- 
mals further increased their HECT regulatory toolkit from their 
unicellular ancestor with six new HECT families. 

Overall, we show a complex evolutionary scenario in which 
the HECT system has evolved toward different degrees of di- 
versification in different clades, through family diversification 
and domain shuffling. Our genomic survey of HECT proteins 
clarifies the origin and evolution of different HECT families 
among eukaryotes and also represents a useful evolutionary 
framework for analyzing this important posttranslational reg- 
ulatory mechanism. 

Supplementary Material 

Supplementary figures S1-S3 and tables S1 and S2 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).